Abstract: Gait is a behavioural biometric that does not require the subject's collaboration, as it can be captured at a distance. Gait-based age estimation has extensive applications in surveillance, customer age estimation in shopping centres and malls for business intelligence purposes, and age-constrained access control to places such as liquor shops. In this paper, we propose Gait-Net, a LeNet-50 inspired age classification Convolutional Neural Network (CNN) for Gait-based age estimation. We propose applying a heat map filter to each Gait Energy Image (GEI) to enhance its age-differentiating features, followed by sequential age group and age estimation CNN models. We address the inherent class imbalance problem, caused by the lack of sufficient data for elderly subjects, by using image augmentation. We evaluated our model on the OU-ISIR Large Population Gait Database and the results confirmed its efficiency.


I. INTRODUCTION


Gait-based biometric identification has been extensively studied in recent years for its comparative viability in certain environments over physiological biometrics such as fingerprints, facial recognition and iris recognition [1-5]. A Gait capturing setup can achieve its task even with a non-cooperative, distant subject and a low-resolution camera. In addition to individual identification, numerous recent studies have explored the Gait-based evaluation of attributes such as gender, age group, ethnicity and age [6-8]. Among these, age estimation has been the primary focus because of its relatively larger number of applications in visual surveillance, access control, forensics and criminal investigation.


Earlier studies focussed on establishing a relationship between Gait attributes, such as arm swing, leg stride and stride frequency, and the age of the subject. Davis [9] established a relationship between age and Gait attributes to differentiate between adult and child subjects. Abreu et al. [10] in part established the possibility of gait-based age estimation using gait analysis. They produced and then used representations of the cyclic movements of limbs, called cyclograms, as the input to the feature extraction phase. They were successful in creating a model that differentiated between a young and an elderly subject, but it could not differentiate between the two genders. Ince et al. [11] studied the shape of the body as an age-determining construct by differentiating a child from an adult using their head-to-body proportion. Callisaya et al. [7] discovered that the gender of a person alters the relationship between the person's age and their gait, as they found a substantial relationship between gender and various Gait attributes.


In conventional image processing or computer vision based Gait-based age estimation models, a Gait descriptor serves as the input. The common Gait descriptors used in the literature are the Gait Image Contour, the Gait Image Silhouette and the Gait Energy Image (GEI), as shown in Figure 1. Its better feature representation ability and effectiveness make the GEI the most widely used feature descriptor. It is a combined representation of multiple Gait sequences superimposed on top of each other. The GEI captures both static features, such as the shape of the body, and dynamic features, such as the arm swing, more comprehensively than the other descriptors.
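The superimposition that produces a GEI can be illustrated with a minimal sketch. It assumes the silhouettes are already segmented, size-normalized and aligned binary frames, and the function name `gait_energy_image` is hypothetical rather than taken from any dataset toolkit.

```python
def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes (0/1 values) into one GEI."""
    if not silhouettes:
        raise ValueError("at least one silhouette frame is required")
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    gei = [[0.0] * width for _ in range(height)]
    for frame in silhouettes:
        for r in range(height):
            for c in range(width):
                gei[r][c] += frame[r][c]
    n = len(silhouettes)
    return [[value / n for value in row] for row in gei]


# Three toy 2x3 frames: the left column is always foreground (static shape),
# the right column is foreground only in some frames (moving limb).
frames = [
    [[1, 0, 1], [1, 0, 0]],
    [[1, 0, 0], [1, 0, 1]],
    [[1, 0, 1], [1, 0, 0]],
]
gei = gait_energy_image(frames)
```

Pixels that are foreground in every frame keep full intensity, which encodes the static body shape, while pixels covered only part of the time receive intermediate values, which encode the motion.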





Fig 1. Gait Image Contour, Gait Image Silhouette, and Gait Energy Image. 


In this paper we put forth Gait-based age estimation using a LeNet-50 inspired Convolutional Neural Network. The depth of a deep learning model trained on a GEI-based Gait image dataset is restricted by the intrinsic deficiency of features in the GEI: exceeding a certain number of layers in the CNN results in overfitting. To overcome this shortcoming, we propose heatmap filtering of the GEIs for feature enhancement, as shown in Figure 2. The heatmap representation allows us to train a deeper CNN over the dataset without overfitting the training set.
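As a rough illustration of heatmap filtering, the sketch below maps each grayscale GEI intensity to an RGB triple, turning an H x W image into an H x W x 3 one. The paper does not name the colormap it uses, so the blue to green to red ramp and the function names here are assumptions chosen only to show the idea.

```python
def intensity_to_rgb(v):
    """Map a GEI intensity in [0, 1] to an (r, g, b) triple in [0, 1].

    Hypothetical blue -> green -> red ramp; the actual colormap is not
    specified in the text.
    """
    v = min(max(v, 0.0), 1.0)
    if v < 0.5:
        t = v / 0.5
        return (0.0, t, 1.0 - t)
    t = (v - 0.5) / 0.5
    return (t, 1.0 - t, 0.0)


def apply_heatmap(gei):
    """Turn an H x W grayscale GEI into an H x W x 3 heatmap."""
    return [[intensity_to_rgb(v) for v in row] for row in gei]


gei = [[0.0, 0.5, 1.0], [0.25, 0.75, 1.0]]
heatmap = apply_heatmap(gei)
```

Spreading a single intensity channel over three colour channels is one plausible way in which nearby gray levels become easier for the convolution filters to separate.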





Fig 2. GEI and Heatmap. 


The proposed LeNet-50 inspired CNN overcomes the problem of getting stuck in a local minimum near the global minimum, as faced by a conventional CNN trained with a uniform learning rate. Iteratively decreasing the learning rate near the global minimum facilitates convergence to it. A common shortcoming of the preceding studies has been the low prediction accuracy for elderly and child subjects, a bias induced by the scarcity of data for those age groups. To overcome this limitation, we used image augmentation to reduce the class imbalance in the dataset, thus improving the age prediction accuracy for those age groups. The performance evaluation of our model was conducted on the OU-ISIR Large Population Gait Database [12], which is the largest Gait database with age developed to date, comprising 63,846 images of subjects aged between 2 and 90 years.


The contributions of this paper are: (1) a LeNet-50 inspired sequential algorithm for gender prediction and sequential age prediction, and (2) an age estimation model with improved age group prediction accuracy for children and elders.
II. RELATED WORK


Earlier studies based on computer vision mostly required manual extraction of features from the gait descriptor. Zhang et al. [13] proposed a Hidden Markov Model based age group classification model. In the feature extraction phase, they generated a Frame to Exemplar distance vector containing the distances from multiple contour points to the centroid of the contour. They achieved an accuracy of 83.33% in the bi-class classification of young and old subjects over a self-developed 14-subject dataset. Mansouri Nabila et al. [14] proposed a novel Gait descriptor that captured both the Spatiotemporal Transverse and Spatiotemporal Longitudinal projections of the gait descriptor, which was a silhouette in this case. They employed a Support Vector Machine (SVM) over the 4000-subject OU-ISIR database, reaching a precision of about 74%. Xiang Li et al. [15] performed an age group classification of the subjects. They used a directed acyclic graph (DAG) for the age group representation and an SVM with a Gaussian kernel for the classification task. They achieved an average age group classification accuracy of 72.23% and an age estimation Mean Absolute Error (MAE) of 6.78 years over the OULP-Age dataset, which comprises 63,846 subjects, with approximately equal numbers of male and female subjects and ages between 2 and 90 years.


Jiwen Lu et al. [16] used a fusion technique for Gabor features, such as the gait sequence phase and the Gabor magnitude, for the purpose of feature enhancement. They used the USF gait database for model evaluation and achieved a Mean Absolute Error of 5.42 years. Makihara et al. [17], in one of the first studies to use the Gait Energy Image descriptor, used the Gaussian regression technique to predict the age. For model training and evaluation, they used a self-created gait database of 1728 subjects with ages ranging from 2 to above 90 years. The best MAE they reached was 8.2 years.


M. Hu et al. [18] presented a technique based on the intensification of mutual information, using the Gabor filter for feature extraction and a Bayes rule based on the Hidden Markov Model (HMM) for classification. It performed both gender and age classification (young and old). The gender classification results were evaluated on the CASIA(B) and IRIP datasets, and the age classification on the database used by Zhang et al. [13]. A recent study by A. Sakata et al. [19] put forth a deep learning-based model which employed multiple Convolutional Neural Networks sequentially in the age estimation process. The GEI first passes through a CNN that predicts the gender, and then sequentially through two further CNNs that predict the age group and the age (achieving an MAE of 5.84 years). T. Islam et al. [23] presented a comprehensive analysis of the related studies in this area, which turn out to be few. They compared various gait-based age estimation techniques using various evaluation metrics and found that the Deep Learning based studies had achieved the best results.


The efficiency of Deep Learning based techniques motivated us to pursue our research in this direction.


III. PROPOSED METHOD


We propose separate models for age and gender estimation, as depicted in the flowcharts in Figures 3 and 4. In the gender estimation process, a heatmap filter is applied to the GEI, which is subsequently fed into the CNN that predicts the binary gender label. For age estimation, a sequential CNN setup first predicts the age group: Toddler (2-5), Child (6-11), Adolescent (12-18), Adult (19-60) or Old (61-90). Subsequently, the age classification CNN predicts the age of the subject.
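The two-stage routing described above can be sketched as follows. The age group boundaries come from the text, while `estimate_age`, the stub callables and their return values are hypothetical stand-ins for the trained CNNs.

```python
AGE_GROUPS = {
    "Toddler": (2, 5),
    "Child": (6, 11),
    "Adolescent": (12, 18),
    "Adult": (19, 60),
    "Old": (61, 90),
}


def age_to_group(age):
    """Return the age group label used to train the first-stage CNN."""
    for name, (low, high) in AGE_GROUPS.items():
        if low <= age <= high:
            return name
    raise ValueError(f"age {age} is outside the 2-90 range")


def estimate_age(heatmap, group_model, age_models):
    """Two-stage inference: predict the group, then the age within it."""
    group = group_model(heatmap)
    return group, age_models[group](heatmap)


def make_stub_age_model(age):
    # Stand-in for a trained per-group CNN (assumption for illustration).
    return lambda heatmap: age


stub_group_model = lambda heatmap: "Child"
stub_age_models = {
    name: make_stub_age_model(low) for name, (low, high) in AGE_GROUPS.items()
}
result = estimate_age(None, stub_group_model, stub_age_models)
```

Because the second stage is selected by the first stage's output, an error in the group prediction sends the sample to a model that cannot output the correct age.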







[Flowchart: Gender Classification CNN -> Gender (M/F)]

Fig 3. Gender Classification Model.

Fig 4. Sequential Age Estimation Model.


3.1. Model Architecture 


The proposed CNN architecture is depicted in Figure 5. The input GEI of 128x88 pixels is converted into a heatmap of size 128x88. The proposed model includes two pairs of consecutive convolution and max-pooling layers with a dropout probability of 0.5. The first pair has 81 filters of size (5,5) and (3,3) max-pool filters, and the second pair has 45 filters of size (7,7) and (3,3) max-pool filters. A dropout with probability 0.3 is applied before the flattening operation. We used three fully connected layers with 1024, 256 and 32 nodes and a dropout rate of 0.3. Finally, the last layer, the recognition layer, has two nodes in the gender classification model and, in each age group model, a number of nodes equal to the number of distinct age values in the respective age group.

Fig 5. Gait-Net Architecture.
Table 1. Model Architecture.

Operation Layer                     Number of Filters    Size
Input (Heat Map Filtered Image)     -                    128 x 88 x 3
Conv2D                              81                   5 x 5
MaxPooling2D                        81                   3 x 3
Dropout                             -                    -
Conv2D                              45                   7 x 7
MaxPooling2D                        45                   3 x 3
Dropout                             -                    -
Flatten                             -                    405 x 1
Dense                               -                    1024 x 1
Dense                               -                    256 x 1
Dense                               -                    32 x 1
Output Layer                        -                    Number of classes
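To give a feel for the model size implied by Table 1, the following sketch counts trainable parameters layer by layer. It takes the filter counts, kernel sizes, three input channels and the flatten size of 405 directly from the table, and assumes one bias per filter or unit; the helper names are hypothetical.

```python
def conv2d_params(filters, kernel, in_channels):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return filters * (kernel * kernel * in_channels + 1)


def dense_params(units, inputs):
    """Weights plus one bias per unit for a fully connected layer."""
    return units * (inputs + 1)


def gaitnet_param_count(num_classes, in_channels=3, flatten_size=405):
    """Per-layer and total parameter counts for the layers in Table 1."""
    layers = [
        conv2d_params(81, 5, in_channels),   # Conv2D, 81 filters of 5 x 5
        conv2d_params(45, 7, 81),            # Conv2D, 45 filters of 7 x 7
        dense_params(1024, flatten_size),    # Dense 1024
        dense_params(256, 1024),             # Dense 256
        dense_params(32, 256),               # Dense 32
        dense_params(num_classes, 32),       # Output layer
    ]
    return layers, sum(layers)


# Gender model: two output classes.
layers, total = gaitnet_param_count(num_classes=2)
```

Under these assumptions the first dense layer holds the largest share of the parameters, and only the output layer changes between the gender model and the per-group age models.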


The dense layers are initialized with he_normal [20] as the kernel_initializer, and the biases are initialized to zeros. The relu [21] activation function is used in the three dense layers and sigmoid in the recognition layer. The architecture and hyperparameters were selected through rigorous training and testing of multiple architecture and hyperparameter combinations to arrive at the combination with the best evaluation metric results.

www.ijaers.com
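The initialization and activation named above can be illustrated in a few lines. He normal initialization draws weights with standard deviation sqrt(2 / fan_in); the sketch below uses plain Gaussian draws for brevity, whereas library implementations typically truncate the distribution, and the fan-in of 405 and fan-out of 8 are only example values.

```python
import math
import random


def he_normal_stddev(fan_in):
    """Standard deviation used by He normal initialization."""
    return math.sqrt(2.0 / fan_in)


def he_normal_weights(fan_in, fan_out, seed=0):
    """Sample a fan_in x fan_out weight matrix from N(0, 2 / fan_in).

    Untruncated Gaussian draws, a simplification of what deep learning
    libraries do.
    """
    rng = random.Random(seed)
    std = he_normal_stddev(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def relu(x):
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return max(0.0, x)


weights = he_normal_weights(fan_in=405, fan_out=8)
biases = [0.0] * 8  # biases start at zero, as stated in the text
```

Scaling the variance by the fan-in keeps the activation magnitudes roughly stable across ReLU layers, which is why the two are commonly paired.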
3.2. Model Training


We propose a two-phase training setup for the CNN. In the first phase, the model is compiled using the Adam optimizer with the default learning rate of 0.001 and categorical_crossentropy as the loss function. It is trained for twenty epochs on the training set with a batch size of 32. Model checkpointing on val_accuracy ensures that only the model with the best validation accuracy is saved. In the second phase, the saved model is further trained in multiple subphases with an iteratively decreasing learning rate. Each subphase contains three epochs, with the learning rate iteratively decreasing by a factor of 10°, 0.8, 0.2, 0.08 respectively.
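The two-phase procedure can be sketched as a schedule of (phase, epochs, learning rate) steps plus a checkpoint selector. This is only one reading of the text: it assumes the factors 0.8, 0.2 and 0.08 each scale the base learning rate, and the base rate is passed in explicitly (1e-3 below is Adam's usual default). All function names are hypothetical.

```python
def two_phase_schedule(base_lr, phase1_epochs=20,
                       factors=(0.8, 0.2, 0.08), subphase_epochs=3):
    """Return a list of (phase_name, epochs, learning_rate) steps.

    Assumption: each factor multiplies the base learning rate.
    """
    steps = [("phase1", phase1_epochs, base_lr)]
    for i, factor in enumerate(factors, start=1):
        steps.append((f"phase2.{i}", subphase_epochs, base_lr * factor))
    return steps


def best_checkpoint(val_accuracies):
    """Index of the epoch that a val_accuracy checkpoint would keep."""
    return max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])


schedule = two_phase_schedule(base_lr=1e-3)
total_epochs = sum(epochs for _, epochs, _ in schedule)
```

Each second-phase step would resume from the checkpoint saved in the previous step, so the smaller steps refine the best model found so far rather than restarting training.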


IV. EXPERIMENTAL RESULTS AND DISCUSSION


4.1. Dataset 


The model was evaluated on the OU-ISIR Gait Database, Large Population Dataset with Age (OULP-Age), which is the largest Gait database developed to date. The OU-ISIR biometric database is a repository of various Gait databases such as the Large Population Dataset with Age, the Population Dataset with Bag and the Inertial Sensor Dataset. They were developed by capturing a side-view video of the Gait sequence of each subject, followed by a three-step GEI extraction: segmentation, normalization and averaging. The Large Population Dataset with Age is a collection of 63,846 GEI images of both male and female subjects with ages between 2 and 90 years.
International Journal of Advanced Engineering Research and Science, 8(8)-2021 


Table 2. Gender wise breakup.

Male      Female
31,083    32,453


Table 3. Age Group wise breakup.

Age Group     No. of Subjects
Toddler       1573
Child         9654
Adolescent    8724
Adult         42110
Old           1784


Table 4. Gender wise Train-Test split.

          Train    Test
Male      7829     7767
Female    8132     8195





Table 2 gives the gender-wise breakup of the database. The training set and test set split is done at 50% (15961 and 15962 subjects respectively). The split remains the same for all three classification processes.


Fig 6. Frequency distribution of distinct age values (x-axis: age value, 2-90; y-axis: frequency).
4.2. Performance Evaluation


Since gender estimation is a bi-class classification problem, we require only a simple evaluation metric such as the accuracy, which is given by

$$\text{Accuracy} = \frac{N_c}{N} \times 100\% \qquad (1)$$

where $N_c$ represents the number of samples that were correctly classified by the model and $N$ is the total sample size. In the age estimation problem (a regression problem), the frequently used performance evaluation metrics are the Mean Absolute Error (MAE) and the Standard Deviation (SD). The MAE is given by

$$\text{MAE} = \frac{1}{N} \sum_{x=1}^{N} |t_x - p_x| \qquad (2)$$

where $N$ is the total sample size of the test set, and $t_x$ and $p_x$ are the actual and predicted values of age for the $x$-th sample.

The following formula is used to compute the standard deviation:

$$\text{SD} = \sqrt{\frac{1}{N-1} \sum_{x=1}^{N} \left(|t_x - p_x| - \text{MAE}\right)^2} \qquad (3)$$
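The three evaluation metrics can be computed with a few lines of plain Python. This is a minimal sketch: the function names are hypothetical, and the standard deviation of the absolute errors is normalized by N - 1, which is an assumption about the intended formula.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Percentage of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def mean_absolute_error(true_ages, predicted_ages):
    """Mean of |t_x - p_x| over all samples."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def error_standard_deviation(true_ages, predicted_ages):
    """Standard deviation of the absolute errors (N - 1 normalization assumed)."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    mae = sum(errors) / len(errors)
    return math.sqrt(sum((e - mae) ** 2 for e in errors) / (len(errors) - 1))


true_ages = [10, 20, 30, 40]
predicted_ages = [12, 18, 30, 44]
mae = mean_absolute_error(true_ages, predicted_ages)
sd = error_standard_deviation(true_ages, predicted_ages)
acc = accuracy(["M", "F", "F", "M"], ["M", "F", "M", "M"])
```

The standard deviation is taken over the absolute errors rather than over the ages, so it describes how consistent the error magnitude is from sample to sample.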


4.3. CNN without heatmap filtered GEI 


Sakata et al. [19] used a CNN model with the following architecture: Conv1(81,5,5), i.e. 81 filters of size 5x5, Pool1(3,3), Conv2(45,7,7), Pool2(2,2), fc3(1024) and fc4(1). The model was trained for 20 epochs with model checkpointing on val_accuracy on the GEI training set for gender estimation. A batch size of 128 and the Adam optimizer with the default learning rate of 0.001 were used. The model achieved a classification accuracy of 98.28% over the training set and 96.03% over the test set. The classification results over the training set and test set are given in Tables 5 and 6.


Table 5. Gender prediction results on training set.

                   Predicted Label
True Label    Male      Female    Total
Male          16174     152       16326
Female        394       15202     15596
Total         16565     15354


Table 6. Gender prediction results on test set.

                   Predicted Label
True Label    Male      Female    Total
Male          15701     672       16373
Female        294       15159     15453
Total         16095     15831





The same CNN architecture, when employed for age group estimation, yielded the classification results shown in Tables 7 and 8.


4.4. GaitNet with heatmap filtered GEI: 


We improved the gender and age group classification accuracy of the conventional single dense layer CNN with our proposed GaitNet. The training set and test set gender classification accuracies reached 99.03% and 96.96% respectively. Tables 9 and 10 give the gender classification results of GaitNet with heatmap filtered GEI over the training and test sets respectively. Tables 11 and 12 present the confusion matrices for age group prediction over the training set and test set respectively.


Table 7. Age group prediction results on training set.

                                Predicted Label
True Label    Toddler    Child    Adolescent    Adult    Old    Total
Toddler       692        137      0             0        0      829
Child         157        4173     305           130      2      4740
Adolescent    0          623      2569          1171     10     4373
Adult         2          74       647           20291    130    21144
Old           0          14       32            455      336    837
Total         851        4884     3553          22047    478


Table 8. Age group prediction results on test set.

                                Predicted Label
True Label    Toddler    Child    Adolescent    Adult    Old    Total
Toddler       327        215      1             1        0      744
Child         238        4140     388           147      1      4913
Adolescent    5          758      2266          1310     12     4350
Adult                    103      819           19872    168    20966
Old                      35       51            574      287    947
Total         775        5251     3525          21904    468

Table 9. Gender prediction results on training set.

                   Predicted Label
True Label    Male      Female    Total
Male          16101     225       16326
Female        84        15512     15596
Total         16185     15737


Table 10. Gender prediction results on test set.

                   Predicted Label
True Label    Male      Female    Total
Male          15877     496       16373
Female        767       14686     15453
Total         16644     15182

Table 11. Age group prediction results on training set.

                                Predicted Label
True Label    Toddler    Child    Adolescent    Adult    Old    Total
Toddler       619        210      0             0        0      829
Child         37         4137     523           18       3      4740
Adolescent    0          280      3359          718      16     4373
Adult         1          34       845           20206    58     21144
Old           0          2        23            500      312    837
Total         677        4663     4752          21442    389


Table 12. Age group prediction results on test set.

                                Predicted Label
True Label    Toddler    Child    Adolescent    Adult    Old    Total
Toddler       469        275      0             0        0      744
Child         131        3939     748           85       10     4913
Adolescent    2          469      2871          990      19     4350
Adult         0          66       1122          19700    78     20966
Old           0          8        32            629      258    947
Total         602        4757     4793          21404    365
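A confusion matrix such as Table 12 can be summarized per class to see which age groups are hardest. The sketch below copies the cell values of Table 12 (rows are true labels, columns are predicted labels) and recomputes the row sums from the cells, so they may differ slightly from the printed totals; the function names are hypothetical.

```python
GROUPS = ["Toddler", "Child", "Adolescent", "Adult", "Old"]

# Rows are true labels, columns are predicted labels (cells from Table 12).
TABLE_12 = [
    [469, 275, 0, 0, 0],
    [131, 3939, 748, 85, 10],
    [2, 469, 2871, 990, 19],
    [0, 66, 1122, 19700, 78],
    [0, 8, 32, 629, 258],
]


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Diagonal sum divided by the sum of all cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


recall = dict(zip(GROUPS, per_class_recall(TABLE_12)))
acc = overall_accuracy(TABLE_12)
```

With these values the Adult class has by far the highest recall and the Old class the lowest, which is the pattern one would expect from the class imbalance in the dataset.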


4.5. GaitNet with image augmentation: 


The inherent class imbalance problem in the OULP dataset induces lower classification accuracy for the toddler, child and old age groups. We used image augmentation on these age groups to alleviate the problem. To increase the number of subjects in the scarce age groups, we cloned the GEIs of the toddler, child and old age groups in the training set. We applied the width_shift and height_shift transformations to the enlarged dataset. The size of the augmented dataset increased to 76,667 subjects. Low levels of curated noise can also be added for data augmentation, such that the newly generated images preserve the discriminating features while representing new subjects. Table 13 shows the classification results of GaitNet over the augmented training set, and Table 14 gives the test set classification results.
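The cloning and shifting step can be sketched in plain Python as follows. The shift amounts, the zero fill for uncovered pixels and the function names are assumptions made for illustration; the text only states that width and height shifts were applied to the cloned GEIs of the scarce age groups.

```python
def shift_image(image, dx, dy, fill=0.0):
    """Shift a 2D image by dx columns and dy rows, padding with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]
    return shifted


def augment_minority(samples, minority_groups, shifts):
    """Append shifted copies of every sample whose group is under-represented."""
    augmented = list(samples)
    for image, group in samples:
        if group in minority_groups:
            for dx, dy in shifts:
                augmented.append((shift_image(image, dx, dy), group))
    return augmented


image = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0]]
samples = [(image, "Old"), (image, "Adult")]
augmented = augment_minority(samples, {"Toddler", "Child", "Old"}, [(1, 0), (-1, 0)])
```

Only the training samples of the scarce groups are multiplied, so the class frequencies seen during training move closer together while the test set stays unchanged.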


Table 13. Age group prediction results on training set.

                                Predicted Label
True Label    Toddler    Child    Adolescent    Adult    Old    Total
Toddler       829        0        0             0        0      829
Child         17         4608     114           1        0      4740
Adolescent    0          33       4276          63       1      4373
Adult         0          9        847           20134    1      20991
Old           0          0        0             0        837    837
Total         846        4650     5237          20198    839


Table 14. Age group prediction results on test set.

                                Predicted Label
True Label    Toddler    Child    Adolescent    Adult    Old    Total
Toddler       546        198      0             0        0      744
Child         294        3812     724           70       13     4913
Adolescent    3          622      2934          760      32     4351
Adult         3          65       1801          18593    504    20966
Old           0          8        44            409      456    947
Total         846        4705     5503          19927    1035

4.6. GaitNet with heatmap filtered GEI:




The effect of gender on age estimation has been established by various studies and was exploited by previous sequential models, whose improved age estimation results corroborate it. However, using a gender-separated sequential model with our network yields detrimental results. This is caused by the decrease in training data, as gender separation divides the dataset into two subsets of approximately half the size of the mixed-gender dataset. Training a deeper CNN on less data therefore ended up overfitting, thereby degrading the test set results.


4.7. Age estimation results using the GaitNet: 


GaitNet employs a sequential process for age estimation. First the age group of the subject is predicted, followed by age estimation using a CNN trained on the predicted age group. Instead of addressing age estimation as a regression problem, we treat it as a classification problem. The number of nodes in the final layer of GaitNet is therefore equal to the number of distinct age values in the respective age group; e.g., the CNN for age prediction in the toddler age group has 4 nodes, as it covers age values 2-5 years. Table 15 compares our method with the existing methods in the literature using the MAE and SD evaluation metrics. Figure 7 plots the true age values against the predicted values.
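Treating age as a class label means each per-group model needs a mapping between output node indices and ages. A minimal sketch follows, using the age group ranges given in the paper; the function names and the example scores are hypothetical.

```python
AGE_GROUPS = {
    "Toddler": (2, 5),
    "Child": (6, 11),
    "Adolescent": (12, 18),
    "Adult": (19, 60),
    "Old": (61, 90),
}


def num_output_nodes(group):
    """One output node per distinct age value in the group."""
    low, high = AGE_GROUPS[group]
    return high - low + 1


def class_index_to_age(group, index):
    """Convert an output node index back to an age in years."""
    low, high = AGE_GROUPS[group]
    if not 0 <= index <= high - low:
        raise ValueError(f"index {index} is out of range for group {group}")
    return low + index


def predict_age(group, scores):
    """Pick the age whose output node has the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_index_to_age(group, best)


sizes = {group: num_output_nodes(group) for group in AGE_GROUPS}
toddler_age = predict_age("Toddler", [0.1, 0.2, 0.6, 0.1])
```

The toddler model has 4 output nodes for ages 2 to 5, matching the example in the text, while the adult model needs 42 nodes for ages 19 to 60.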


Table 15. Comparison of our GaitNet with previous methods on the basis of MAE and SD.

Method                        MAE (Years)    SD (Years)
Parallel multi-CNN            5.84           6.5
Sequential multi-CNN [19]     023            6.61
Makihara et al. [17]          7.30           6.64
Guo et al. [22]               7.66           7.10
GaitNet (Ours)                5.08           4.29


V. CONCLUSION AND FUTURE SCOPE


In this paper we proposed GaitNet to improve Gait-based age estimation accuracy, and it outperformed the existing Gait-based age estimation models compared in Table 15. The heatmap filter made it possible to train a deeper CNN by increasing the number of distinguishing features in the GEI. The image augmentation technique alleviated the effects of data scarcity for elder and child subjects. Although we addressed most of the shortcomings of the existing models, some still remain: the sequential nature of the age prediction process means that a wrong age group prediction for a subject subsequently degrades its age prediction.


In future, the work can be extended by further feature enhancement of the GEI Gait descriptor, eventually making it possible to employ a deeper CNN. Extending the OULP Gait database with more subjects in the elder and children age groups would further enhance the age group prediction accuracy, ultimately improving the age estimation accuracy.